
flash-attn2: Add flash_attn_with_kvcache support for XPU#534

Open
YangKai0616 wants to merge 7 commits into huggingface:main from YangKai0616:fa-kvcache

Conversation

@YangKai0616
Contributor

Functionality checks passed; performance still needs to be tested.

@YangKai0616
Contributor Author

In transformers, PR huggingface/transformers#44379 enables the flash_attn_with_kvcache path to speed up paged-attention decode. This PR adds the same feature for XPU, which gives close to a 2x performance boost.
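
For reference, a minimal sketch of what a single decode-step call looks like, assuming the upstream flash-attn `flash_attn_with_kvcache` calling convention (q, k_cache, v_cache, new k/v, cache_seqlens, block_table); the import path, `"xpu"` device string, and tensor shapes below are illustrative, not taken from this PR:

```python
import torch
from flash_attn import flash_attn_with_kvcache  # import path assumed; the XPU build may expose it differently

batch, nheads, headdim = 4, 8, 64
page_size, num_pages, max_pages_per_seq = 256, 32, 8
device = "xpu"  # assumed device string for Intel GPUs

# Query for a single decode step: one new token per sequence.
q = torch.randn(batch, 1, nheads, headdim, device=device, dtype=torch.float16)

# Paged KV cache: (num_pages, page_size, nheads, headdim).
k_cache = torch.randn(num_pages, page_size, nheads, headdim, device=device, dtype=torch.float16)
v_cache = torch.randn_like(k_cache)

# New key/value for the current token; the kernel appends them into the cache.
k_new = torch.randn(batch, 1, nheads, headdim, device=device, dtype=torch.float16)
v_new = torch.randn_like(k_new)

# Current cache lengths per sequence, and the page table mapping sequences to cache pages.
cache_seqlens = torch.full((batch,), 130, dtype=torch.int32, device=device)
block_table = torch.arange(batch * max_pages_per_seq, dtype=torch.int32, device=device).view(
    batch, max_pages_per_seq
)

out = flash_attn_with_kvcache(
    q, k_cache, v_cache,
    k=k_new, v=v_new,
    cache_seqlens=cache_seqlens,
    block_table=block_table,
    causal=True,
)
print(out.shape)  # (batch, 1, nheads, headdim)
```

The point of this entry point is that decode reads K/V directly from the cache pages instead of rebuilding contiguous key/value tensors each step, which is where the speedup for paged decode comes from.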

@YangKai0616 YangKai0616 marked this pull request as ready for review May 9, 2026 08:38
@YangKai0616 YangKai0616 requested a review from drbh as a code owner May 9, 2026 08:38